Time-Space Scheduling in the MapReduce Framework

نویسندگان

  • Zhuo Tang
  • Ling Qi
  • Lingang Jiang
  • Kenli Li
  • Keqin Li
چکیده

As data are the basis of information systems, using Hadoop to rapidly extract useful information from massive data of an enterprise has become an efficient method for programmers in the process of application development. This chapter introduces the MapReduce framework, an excellent distributed and parallel computing model. For the increasing data and cluster scales, to avoid scheduling delays, scheduling skews, poor system utilization, and low degrees of parallelism, some improved methods that focus on the time and space scheduling of reduce tasks in MapReduce are proposed in this chapter. Through analyzing the MapReduce scheduling mechanism, this Some significant language editing was done. please check. conTenTs 7.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phurti: Application and Network-aware Flow Scheduling for Mapreduce

Traffic for a typical MapReduce job in a datacenter consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can reduce the average job completion time. However, most of them treat the core network as a black box with ...

متن کامل

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

I/O Throttling and Coordination for MapReduce

As a leading framework for data intensive computing, MapReduce has gained enormous popularity in large-scale data analysis. With the increasing adoption of multi/many core platform, more and more MapReduce tasks are now running on the same node and sharing the same storage resources. The concurrency of tasks raises the issue of I/O stream congestion. We have observed significant throughput drop...

متن کامل

A Survey on MapReduce Performance and Hadoop Acceleration

MapReduce is implementation for generating large data sets with a parallel, distributed algorithm on a cluster. Hadoop is open source implementation of the MapReduce programming datamodel used for large-scale parallel applications such as web indexing, data mining, and scientific simulation. Hadoop-A framework is able to levitate Hadoop acceleration and give significant performance compared to ...

متن کامل

Availability and Network-Aware MapReduce Task Scheduling over the Internet

MapReduce offers an ease-of-use programming paradigm for processing large datasets. In our previous work, we have designed a MapReduce framework called BitDew-MapReduce for desktop grid and volunteer computing environment, that allows nonexpert users to run data-intensive MapReduce jobs on top of volunteer resources over the Internet. However, network distance and resource availability have gre...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015